Structural KLD for Cross-Variety Speaker Adaptation in HMM-Based Speech Synthesis
Authors
Abstract
While the synthesis of natural-sounding, neutral-style speech can be achieved with today’s technology, fast adaptation of speech synthesis to different contexts and situations still poses a challenge. In the context of variety modeling (dialects, sociolects) we have to cope with the problem that no standardized orthographic form is available and that speech resources for these varieties are scarce. We present recent approaches in the field of cross-lingual speaker transformation for HMM-based speech synthesis and propose a method for transforming an arbitrary speaker’s voice from one variety to another. We apply the Kullback-Leibler divergence to map HMM states, transfer probability density functions to the decision tree of the other variety, and perform speaker adaptation. A method for integrating structural information into the mapping is also presented and analyzed. Subjective listening tests show that the proposed method produces speech of significantly higher quality than standard speaker adaptation techniques.
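To make the mapping step concrete, the minimal sketch below pairs each source-variety HMM state with the closest target-variety state under a symmetrised KLD. It assumes single-Gaussian state emissions with diagonal covariances; the function and variable names are illustrative and are not taken from the paper.

```python
import numpy as np

def kld_diag_gauss(mu_p, var_p, mu_q, var_q):
    """D(p || q) for diagonal-covariance Gaussians p and q."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def symmetric_kld(mu_p, var_p, mu_q, var_q):
    """Symmetrised KLD, used here as the state-mapping distance."""
    return (kld_diag_gauss(mu_p, var_p, mu_q, var_q)
            + kld_diag_gauss(mu_q, var_q, mu_p, var_p))

def map_states(source_states, target_states):
    """Map every source-variety HMM state to the target-variety state
    with minimum symmetric KLD. Each state is a (mean, variance) pair."""
    mapping = {}
    for i, (mu_s, var_s) in enumerate(source_states):
        divergences = [symmetric_kld(mu_s, var_s, mu_t, var_t)
                       for mu_t, var_t in target_states]
        mapping[i] = int(np.argmin(divergences))
    return mapping

# Toy usage: map three source states onto two target states.
rng = np.random.default_rng(0)
source = [(rng.normal(size=4), np.ones(4)) for _ in range(3)]
target = [(rng.normal(size=4), np.ones(4)) for _ in range(2)]
print(map_states(source, target))
```

In a full system the transferred probability density functions would then be attached to the leaves of the other variety’s decision tree before speaker adaptation is run; the sketch only covers the divergence-based pairing itself.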
Similar papers
State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
A phone-mapping-based method has previously been introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this paper, we go on to propose a state-mapping-based method for cross-lingual speaker adaptation, in which the state mapping between the voice models in the source and target languages is established under a minimum Kullback-Leibler divergence (KLD) criterion. We introduce two approach...
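For reference, the minimum-KLD criterion is typically evaluated with the closed-form divergence between Gaussian state distributions (written here for single Gaussians of dimension d; the exact cost used in the cited work may differ in detail). Because the divergence is asymmetric, a symmetrised variant is often taken as the mapping cost:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\right)
= \tfrac{1}{2}\!\left[\operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
+ (\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)
- d + \ln\frac{\det\Sigma_2}{\det\Sigma_1}\right],
\qquad
D_{\mathrm{sym}} = D_{\mathrm{KL}}(p\,\|\,q) + D_{\mathrm{KL}}(q\,\|\,p).
```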
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary contin...
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis
In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...
HMM adaptation for child speech synthesis
Hidden Markov Model (HMM)-based synthesis in combination with speaker adaptation has proven to be an approach that is well-suited for child speech synthesis [1]. This paper describes the development and evaluation of different HMM-based child speech synthesis systems. The aim is to determine the most suitable combination of initial model and speaker adaptation techniques to synthesize child spe...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required to achieve significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
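As a rough illustration of the MLLR side of this comparison, the sketch below applies one shared mean transform W = [b | A] to a set of Gaussian mean vectors, assuming a single global regression class. In practice W would be estimated from the adaptation data; the names and values here are hypothetical.

```python
import numpy as np

def apply_mllr_mean_transform(means, W):
    """Apply a single MLLR mean transform, mu' = W [1, mu]^T, to all
    Gaussian mean vectors of one regression class (illustrative only)."""
    means = np.asarray(means)                                     # (n_gaussians, d)
    extended = np.hstack([np.ones((means.shape[0], 1)), means])   # rows [1, mu]
    return extended @ W.T                                         # (n_gaussians, d)

# Toy usage: d = 3, W = [b | A] with A = identity and a constant bias b,
# standing in for a transform estimated from adaptation data.
d = 3
W = np.hstack([np.full((d, 1), 0.5), np.eye(d)])
means = np.zeros((4, d))
print(apply_mllr_mean_transform(means, d * [None] and W))
```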